# Toxicity Scanner

This scanner assesses the toxicity of content generated by language models, acting as a safeguard against
potentially harmful or offensive output.

## Attack scenario

Language models, when interacting with users, can sometimes produce responses that may be deemed toxic or inappropriate.
This poses a risk, as such output can perpetuate harm or misinformation. By monitoring and classifying the model's
output, potential toxic content can be flagged and handled appropriately.

## How it works

The scanner uses the [unitary/unbiased-toxic-roberta](https://huggingface.co/unitary/unbiased-toxic-roberta) model from Hugging Face for binary classification of the text as toxic or non-toxic.

- **Toxicity Detection**: If the text is classified as toxic, the toxicity score corresponds to the model's confidence in this classification.
- **Non-Toxicity Confidence**: If the text is classified as non-toxic, the toxicity score is the complement of the model's confidence, i.e., `1 − confidence score`.
- **Threshold-Based Flagging**: Text is flagged as toxic if the toxicity score exceeds a predefined threshold (default: 0.5).
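
The scoring rules above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the scanner's actual implementation: the function names `toxicity_score` and `is_flagged` and the label string `"toxic"` are assumptions made for the example.

```python
def toxicity_score(label: str, confidence: float) -> float:
    """Map a binary classification result to a toxicity score in [0, 1]."""
    # Assumption: the classifier reports the label "toxic" for toxic text.
    if label == "toxic":
        # Toxic prediction: the score is the model's confidence.
        return confidence
    # Non-toxic prediction: the score is the complement of the confidence.
    return 1.0 - confidence


def is_flagged(label: str, confidence: float, threshold: float = 0.5) -> bool:
    """Flag the text only if the toxicity score exceeds the threshold."""
    return toxicity_score(label, confidence) > threshold
```

Under these rules, a non-toxic prediction with confidence 0.8 yields a toxicity score of about 0.2, which stays below the default threshold of 0.5 and is not flagged.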

## Usage

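```python
from llm_guard.output_scanners import Toxicity
from llm_guard.output_scanners.toxicity import MatchType

# Example inputs; replace them with the real prompt and the model's response.
prompt = "Summarize the customer feedback."
model_output = "The feedback was mostly positive."

scanner = Toxicity(threshold=0.5, match_type=MatchType.SENTENCE)
sanitized_output, is_valid, risk_score = scanner.scan(prompt, model_output)
```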

**Match Types:**

- **Sentence Type**: In `MatchType.SENTENCE` mode, the scanner checks each sentence individually for toxicity.
- **Full Text Type**: In `MatchType.FULL` mode, the entire text is scanned as a single unit.
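
To illustrate why the two modes can give different results, here is a minimal sketch. It does not use the scanner's real classifier or sentence splitter: `split_sentences`, `toy_score`, and `max_toxicity` are hypothetical helpers, the regex splitter is deliberately naive, and taking the maximum over sentences is an assumption about how per-sentence scores might be combined.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter for illustration: break after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def toy_score(text: str) -> float:
    """Stand-in for a real classifier: fraction of words that are 'idiot'."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return sum(w == "idiot" for w in words) / len(words)


def max_toxicity(text: str, score_fn: Callable[[str], float], per_sentence: bool) -> float:
    """Score each sentence (sentence mode) or the whole text (full mode)."""
    units = split_sentences(text) if per_sentence else [text]
    # Assumption: the highest-scoring unit determines the overall score.
    return max((score_fn(u) for u in units), default=0.0)


text = "Thanks for asking. You are an idiot. Here is the answer to your question."
sentence_score = max_toxicity(text, toy_score, per_sentence=True)
full_score = max_toxicity(text, toy_score, per_sentence=False)
```

With this toy scorer, the sentence-level score is 0.25 while the full-text score is roughly 0.07: a single offensive sentence stands out when scored on its own but is diluted when the text is scored as a whole.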

## Optimization Strategies

See the [optimization guide](../get_started/optimization.md).

## Benchmarks

Test setup:

- Platform: Amazon Linux 2
- Python Version: 3.11.6
- Input length: 217
- Number of test runs: 5

Run the following script:

```sh
python benchmarks/run.py output Toxicity
```

Results:

| Instance                         | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS      |
|----------------------------------|------------------|-----------------------|-----------------------|-----------------------|----------------------|----------|
| AWS m5.xlarge                    | 2.89             | 154.18                | 181.05                | 202.55                | 100.40               | 2161.43  |
| AWS m5.xlarge with ONNX          | 0.00             | 49.61                 | 49.98                 | 50.28                 | 48.77                | 4449.47  |
| AWS g5.xlarge GPU                | 33.35            | 282.36                | 373.59                | 446.56                | 99.57                | 2179.37  |
| AWS g5.xlarge GPU with ONNX      | 0.01             | 8.00                  | 9.56                  | 10.81                 | 4.85                 | 44719.38 |
| Azure Standard_D4as_v4           | 3.90             | 182.94                | 213.16                | 237.33                | 118.62               | 1829.38  |
| Azure Standard_D4as_v4 with ONNX | 0.07             | 70.81                 | 73.93                 | 76.43                 | 61.40                | 3534.14  |
